Auditory-based processing of communication sounds
This thesis examines the possible benefits of adapting a biologically-inspired model of human auditory processing as part of a machine-hearing system. Features were generated using the auditory image model (AIM) and used as input to machine-learning systems for speech recognition and audio search. AIM comprises processing that simulates the human cochlea, and a ‘strobed temporal integration’ process that generates a stabilised auditory image (SAI) from the input sound.
The communication sounds which are produced by humans, other animals, and many musical instruments take the form of a pulse-resonance signal: pulses excite resonances in the body, and the resonance following each pulse contains information both about the type of object producing the sound and its size. In the case of humans, vocal tract length (VTL) determines the size properties of the resonance. In the speech recognition experiments, an auditory filterbank was combined with a Gaussian fitting procedure to produce features which are invariant to changes in speaker VTL. These features were compared against standard mel-frequency cepstral coefficients (MFCCs) in a size-invariant syllable recognition task. The VTL-invariant representation was found to produce better results than MFCCs when the system was trained on syllables from simulated talkers of one range of VTLs and tested on those from simulated talkers with a different range of VTLs.
The image stabilisation process of strobed temporal integration was analysed. Based on the properties of the auditory filterbank being used, theoretical constraints were placed on the properties of the dynamic thresholding function used to perform strobe detection. These constraints were used to specify a simple, yet robust, strobe detection algorithm. The syllable recognition system described above was then extended to produce features from profiles of the SAI and tested with the same syllable database as before. For clean speech, performance of the SAI-based features was comparable to that of the features generated from the filterbank output. However, when pink noise was added to the stimuli, performance dropped more slowly as a function of signal-to-noise ratio with the SAI-based AIM features than with either the filterbank-based features or the MFCCs, demonstrating the noise robustness of the SAI representation.
The properties of the auditory filterbank in AIM were also analysed. Three models of the cochlea were considered: the static gammatone filterbank, dynamic compressive gammachirp (dcGC) and the pole-zero filter cascade (PZFC). The dcGC and gammatone are standard filterbank models, whereas the PZFC is a filter cascade, which more accurately models signal propagation in the cochlea. However, while the architecture of the filterbanks is different, they have all been successfully fitted to psychophysical masking data from humans. The abilities of the filterbanks to measure pitch strength were assessed, using stimuli which evoke a weak pitch percept in humans, in order to ascertain whether there is any benefit in the use of the more computationally efficient PZFC.
Finally, a complete sound effects search system using auditory features was constructed in collaboration with Google Research. Features were computed from the SAI by sampling the SAI space with boxes of different scales. Vector quantization (VQ) was used to convert this multi-scale representation to a sparse code. The ‘passive-aggressive model for image retrieval’ (PAMIR) was used to learn the relationships between dictionary words and these auditory codewords. These auditory sparse codes were compared against sparse codes generated from MFCCs, and the best performance was found when using the auditory features.
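The vector quantization step described above can be illustrated with a minimal sketch. It assumes one codebook per box and a one-hot encoding of the nearest codeword; the abstract does not state these details, and the function names, codebooks and feature values below are hypothetical toy data rather than the thesis's actual configuration.

```python
import math

def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def sparse_code(box_features, codebooks):
    """Quantize each box's feature vector with that box's own codebook and
    concatenate the one-hot results into a single sparse vector."""
    code = []
    for vector, codebook in zip(box_features, codebooks):
        one_hot = [0] * len(codebook)
        one_hot[nearest_codeword(vector, codebook)] = 1
        code.extend(one_hot)
    return code

# Toy data (assumed): two boxes, each with a hypothetical 3-entry codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],
    [[0.0, 1.0], [5.0, 5.0], [9.0, 9.0]],
]
box_features = [[0.9, 1.2], [8.0, 8.5]]
code = sparse_code(box_features, codebooks)
print(code)
```

Each box contributes exactly one non-zero entry, so the concatenated vector stays sparse however many boxes and scales are sampled.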
Wavenet based low rate speech coding
Traditional parametric coding of speech facilitates low rate but provides
poor reconstruction quality because of the inadequacy of the model used. We
describe how a WaveNet generative speech model can be used to generate high
quality speech from the bit stream of a standard parametric coder operating at
2.4 kb/s. We compare this parametric coder with a waveform coder based on the
same generative model and show that approximating the signal waveform incurs a
large rate penalty. Our experiments confirm the high performance of the WaveNet
based coder and show that the speech produced by the system is able to
additionally perform implicit bandwidth extension and does not significantly
impair recognition of the original speaker for the human listener, even when
that speaker has not been used during the training of the generative model. Comment: 5 pages, 2 figures
The Zwicky Transient Facility: Surveys and Scheduler
We present a novel algorithm for scheduling the observations of time-domain
imaging surveys. Our Integer Linear Programming approach optimizes an observing
plan for an entire night by assigning targets to temporal blocks, enabling
strict control of the number of exposures obtained per field and minimizing
filter changes. A subsequent optimization step minimizes slew times between
each observation. Our optimization metric self-consistently weights
contributions from time-varying airmass, seeing, and sky brightness to maximize
the transient discovery rate. We describe the implementation of this algorithm
on the surveys of the Zwicky Transient Facility and present its on-sky
performance. Comment: Published in PASP Focus Issue on the Zwicky Transient Facility
(https://dx.doi.org/10.1088/1538-3873/ab0c2a). 13 Pages, 11 Figures
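Assigning targets to temporal blocks under per-field exposure limits is a binary assignment problem. The sketch below solves a toy instance by exhaustive search instead of an Integer Linear Programming solver, purely to make the constraints concrete. The merit values, the slot limit per block and the function name are hypothetical, and filter changes, slew times and the authors' actual metric are not modelled.

```python
from itertools import product

def best_plan(value, slots_per_block, max_exposures_per_field):
    """Exhaustively search binary assignments x[f, b] (field f observed in block b).

    value[f][b] is a hypothetical merit for observing field f in block b.
    Returns the best total merit and the sorted list of chosen (field, block) pairs.
    """
    n_fields, n_blocks = len(value), len(value[0])
    cells = [(f, b) for f in range(n_fields) for b in range(n_blocks)]
    best_score, best_assignment = -1.0, None
    for bits in product((0, 1), repeat=len(cells)):
        x = dict(zip(cells, bits))
        # Constraint: each block holds at most `slots_per_block` observations.
        if any(sum(x[f, b] for f in range(n_fields)) > slots_per_block
               for b in range(n_blocks)):
            continue
        # Constraint: each field receives at most `max_exposures_per_field` exposures.
        if any(sum(x[f, b] for b in range(n_blocks)) > max_exposures_per_field
               for f in range(n_fields)):
            continue
        score = sum(value[f][b] * x[f, b] for f, b in cells)
        if score > best_score:
            best_score, best_assignment = score, x
    chosen = sorted(cell for cell, bit in best_assignment.items() if bit)
    return best_score, chosen

# Toy instance (assumed): 3 fields, 2 blocks, one slot per block, one exposure per field.
value = [[3, 1], [2, 2], [1, 4]]
score, chosen = best_plan(value, slots_per_block=1, max_exposures_per_field=1)
print(score, chosen)
```

Exhaustive search grows as 2^(fields × blocks), so it is only usable for toy sizes; a real night-long plan would need a dedicated solver.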
A Model for Assessing the Visual Resources of River Basins as an Aid to Making Landuse Planning Decisions
The visual quality of a river basin and its associated properties can be identified, evaluated and integrated into the landscape planning process. The model developed provides a quantitative methodology for determining visual quality on the basis of available Geographic Information System factors. These factors are used to develop the preference attributes COLOR, FORM, TEXTURE and LINE, which are associated with the assessment of visual quality. The preference attributes are then combined through a decision-making process into a continuum of DISTINCTIVE, GOOD, AVERAGE and MINIMAL visual quality, which is expressed digitally in map format. Because the visual quality information is provided in a digital format, it can be treated as a discrete component of the planning process, similar to physical, cultural and economic attributes.
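As a rough illustration of combining the four preference attributes into a mapped quality class, the sketch below averages per-cell attribute scores and bins the result. The equal weighting, the 0-to-1 score range, the equal-width bins and the grid values are all assumptions made for illustration; the abstract does not describe the actual decision-making process.

```python
# Classes ordered from lowest to highest visual quality.
CLASSES = ("MINIMAL", "AVERAGE", "GOOD", "DISTINCTIVE")

def classify_cell(color, form, texture, line):
    """Map four attribute scores in [0, 1] to a visual quality class.

    Equal weights and equal-width bins are hypothetical choices.
    """
    score = (color + form + texture + line) / 4.0
    index = min(int(score * 4), 3)  # clamp so a score of 1.0 lands in the top class
    return CLASSES[index]

# Toy 2x2 grid (assumed): each cell holds (COLOR, FORM, TEXTURE, LINE) scores.
grid = [
    [(0.9, 0.8, 0.9, 1.0), (0.5, 0.6, 0.5, 0.6)],
    [(0.1, 0.2, 0.0, 0.1), (0.3, 0.4, 0.3, 0.4)],
]
quality_map = [[classify_cell(*cell) for cell in row] for row in grid]
for row in quality_map:
    print(row)
```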
Low Bit-Rate Speech Coding with VQ-VAE and a WaveNet Decoder
In order to efficiently transmit and store speech signals, speech codecs
create a minimally redundant representation of the input signal which is then
decoded at the receiver with the best possible perceptual quality. In this work
we demonstrate that a neural network architecture based on VQ-VAE with a
WaveNet decoder can be used to perform very low bit-rate speech coding with
high reconstruction quality. A prosody-transparent and speaker-independent
model trained on the LibriSpeech corpus coding audio at 1.6 kbps exhibits
perceptual quality which is around halfway between the MELP codec at 2.4 kbps
and AMR-WB codec at 23.05 kbps. In addition, when training on high-quality
recorded speech with the test speaker included in the training set, a model
coding speech at 1.6 kbps produces output of similar perceptual quality to that
generated by AMR-WB at 23.05 kbps. Comment: ICASSP 201
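The rates quoted above can be put side by side with a few lines of arithmetic. The three bit rates come from the abstract. The relation "codes per second times bits per code" is the standard rate of a stream of discrete codebook indices, and the 200 codes per second with a 256-entry codebook is only one hypothetical combination that yields 1.6 kbps, not the configuration reported by the authors.

```python
import math

def code_bitrate_bps(codes_per_second, codebook_size):
    """Bit rate of a stream of discrete codes: each code costs log2(codebook_size) bits."""
    return codes_per_second * math.log2(codebook_size)

# Rates quoted in the abstract, in kbps.
VQVAE_KBPS = 1.6
MELP_KBPS = 2.4
AMR_WB_KBPS = 23.05

# How many times more bits the reference codecs spend per second.
ratio_vs_melp = MELP_KBPS / VQVAE_KBPS
ratio_vs_amr_wb = AMR_WB_KBPS / VQVAE_KBPS

# Hypothetical configuration (an assumption, not taken from the paper).
hypothetical_bps = code_bitrate_bps(codes_per_second=200, codebook_size=256)

print(f"MELP / VQ-VAE rate ratio:   {ratio_vs_melp:.2f}")
print(f"AMR-WB / VQ-VAE rate ratio: {ratio_vs_amr_wb:.2f}")
print(f"Hypothetical code stream:   {hypothetical_bps:.0f} bps")
```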
Membrane Association and Destabilization by Aggregatibacter Actinomycetemcomitans Leukotoxin Requires Changes in Secondary Structures
Aggregatibacter actinomycetemcomitans is a common inhabitant of the upper aerodigestive tract of humans and non-human primates and is associated with disseminated infections, including lung and brain abscesses, pediatric infective endocarditis, and localized aggressive periodontitis. A. actinomycetemcomitans secretes a repeats-in-toxin protein, leukotoxin, which exclusively kills lymphocyte function-associated antigen-1-bearing cells. The toxin's pathological mechanism is not fully understood; however, experimental evidence indicates that it involves the association with and subsequent destabilization of the target cell's plasma membrane. We have long hypothesized that leukotoxin secondary structure is strongly correlated with membrane association and/or destabilization. In this study, we tested this hypothesis by analyzing lipid-induced changes in leukotoxin conformation. Upon incubation of leukotoxin with lipids that favor leukotoxin-membrane association, we observed an increase in leukotoxin α-helical content that was not observed with lipids that favor membrane destabilization. The change in leukotoxin conformation after incubation with these lipids suggests that membrane binding and membrane destabilization have distinct secondary structural requirements, indicating that they are independent events. These studies thus provide insight into the mechanism of cell damage that leads to disease progression by A. actinomycetemcomitans.
Report: The 62nd Annual Caddo Conference and 27th Annual East Texas Archeological Conference, Tyler, Texas, February 28 and 29, 2020
The 62nd Caddo Conference and 27th East Texas Archeological Conference were held at the University Center on the campus of the University of Texas at Tyler on February 28 and 29, 2020. The conference was dedicated to the rebuilding of public facilities at Caddo Mounds State Historic Site. These facilities had been destroyed by a tornado in 2019. The conference organizers were Thomas Guderjan, Colleen Hanratty, Cory Sills, Christy Simmons (University of Texas at Tyler), Keith Eppich (Tyler Junior College), Anthony Souther (Caddo Mounds State Historic Site), Amanda Regnier (Oklahoma Archeological Survey), and Mark Walters (Texas Historical Commission Steward). Sponsors included The Center for Social Science Research and Department of Social Sciences, University of Texas at Tyler, Humanities Texas, Kevin Stingley, Arkansas Archeological Survey, Beta Analytic, Inc., Friends of Northeast Texas Archeology, East Texas Archeological Society, Maya Research Program, Tejas Archeology, Tyler Junior College, Gregg County Historical Museum, the American Indian Heritage Day of Texas organization, and the Caddo Nation. Before the formal program began, a preconference gathering was held at ETX Brewing Company at 221 S Broadway Avenue in Tyler on Thursday evening, February 27th. Approximately 250 people participated in the joint conferences.
Explorations in anatomy: the remains from Royal London Hospital
This paper considers the faunal remains from recent excavations at the Royal London Hospital. The remains date to the beginning of the 19th century and offer an insight into the life of the hospital's patients and practices of the attached medical school. Many of the animal remains consist of partially dissected skeletons, including the unique finds of Hermann's tortoise (Testudo hermanni) and Cercopithecus monkey. The hospital diet and developments in comparative anatomy are discussed by integrating the results with documentary research. They show that zooarchaeological study of later post-medieval material can significantly enhance our understanding of the exploitation of animals in this period.
Methods for the thematic synthesis of qualitative research in systematic reviews
Background: There is a growing recognition of the value of synthesising qualitative research in the evidence base in order to facilitate effective and appropriate health care. In response to this, methods for undertaking these syntheses are currently being developed. Thematic analysis is a method that is often used to analyse data in primary qualitative research. This paper reports on the use of this type of analysis in systematic reviews to bring together and integrate the findings of multiple qualitative studies.
Methods: We describe thematic synthesis, outline several steps for its conduct and illustrate the process and outcome of this approach using a completed review of health promotion research. Thematic synthesis has three stages: the coding of text 'line-by-line'; the development of 'descriptive themes'; and the generation of 'analytical themes'. While the development of descriptive themes remains 'close' to the primary studies, the analytical themes represent a stage of interpretation whereby the reviewers 'go beyond' the primary studies and generate new interpretive constructs, explanations or hypotheses. The use of computer software can facilitate this method of synthesis; detailed guidance is given on how this can be achieved.
Results: We used thematic synthesis to combine the studies of children's views and identified key themes to explore in the intervention studies. Most interventions were based in school and often combined learning about health benefits with 'hands-on' experience. The studies of children's views suggested that fruit and vegetables should be treated in different ways, and that messages should not focus on health warnings. Interventions that were in line with these suggestions tended to be more effective. Thematic synthesis enabled us to stay 'close' to the results of the primary studies, synthesising them in a transparent way, and facilitating the explicit production of new concepts and hypotheses.
Conclusion: We compare thematic synthesis to other methods for the synthesis of qualitative research, discussing issues of context and rigour. Thematic synthesis is presented as a tried and tested method that preserves an explicit and transparent link between conclusions and the text of primary studies; as such it preserves principles that have traditionally been important to systematic reviewing.